Biclustering via sparse singular value decomposition.

نویسندگان

  • Mihee Lee
  • Haipeng Shen
  • Jianhua Z Huang
  • J S Marron
چکیده

Sparse singular value decomposition (SSVD) is proposed as a new exploratory analysis tool for biclustering or identifying interpretable row-column associations within high-dimensional data matrices. SSVD seeks a low-rank, checkerboard structured matrix approximation to data matrices. The desired checkerboard structure is achieved by forcing both the left- and right-singular vectors to be sparse, that is, having many zero entries. By interpreting singular vectors as regression coefficient vectors for certain linear regressions, sparsity-inducing regularization penalties are imposed to the least squares regression to produce sparse singular vectors. An efficient iterative algorithm is proposed for computing the sparse singular vectors, along with some discussion of penalty parameter selection. A lung cancer microarray dataset and a food nutrition dataset are used to illustrate SSVD as a biclustering method. SSVD is also compared with some existing biclustering methods using simulated datasets.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

L0-norm Sparse Graph-regularized SVD for Biclustering

Learning the “blocking” structure is a central challenge for high dimensional data (e.g., gene expression data). In [Lee et al., 2010], a sparse singular value decomposition (SVD) has been used as a biclustering tool to achieve this goal. However, this model ignores the structural information between variables (e.g., gene interaction graph). Although typical graph-regularized norm can incorpora...

متن کامل

Robust biclustering by sparse singular value decomposition incorporating stability selection

MOTIVATION Over the past decade, several biclustering approaches have been published in the field of gene expression data analysis. Despite of huge diversity regarding the mathematical concepts of the different biclustering methods, many of them can be related to the singular value decomposition (SVD). Recently, a sparse SVD approach (SSVD) has been proposed to reveal biclusters in gene express...

متن کامل

Sparse Biclustering of Transposable Data.

We consider the task of simultaneously clustering the rows and columns of a large transposable data matrix. We assume that the matrix elements are normally distributed with a bicluster-specific mean term and a common variance, and perform biclustering by maximizing the corresponding log likelihood. We apply an ℓ1 penalty to the means of the biclusters in order to obtain sparse and interpretable...

متن کامل

Performance Evaluation of L1-norm-based Microarray Missing Value Imputation

l1-norm minimization was utilized in the imputation of microarray missing values, which is an important procedure in bioinformatics experiments. Two l1 approaches, based on the framework of local least squares (LLS) and iterative biclusterbased least squares (bicluster-iLLS) respectively, were employed. Imputed datasets of the l1 approaches were compared with those of traditional l2 methods. Th...

متن کامل

Minimax Localization of Structural Information in Large Noisy Matrices

We consider the problem of identifying a sparse set of relevant columns and rows in a large data matrix with highly corrupted entries. This problem of identifying groups from a collection of bipartite variables such as proteins and drugs, biological species and gene sequences, malware and signatures, etc is commonly referred to as biclustering or co-clustering. Despite its great practical relev...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Biometrics

دوره 66 4  شماره 

صفحات  -

تاریخ انتشار 2010